Live freelance tracking. Raw descriptions turned into structured data. Find your next tech project without the noise.
freelancer.com | 2026-05-14
[Target]
Client: Jeddah, Saudi Arabia | Member since 2026-05-14
Price: $33/hr (average bid)
Problem: Extract text content from web pages into a structured Excel format.
Existing: Not specified
Specifications:
[Target] - Accurately extract specific text elements (headlines, authors, body text, dates) from the provided URLs.
[Method] - Scrape each URL, extract the relevant text, and preserve its formatting accurately.
[UI/UX] - Not applicable
[Stack] - Python, with BeautifulSoup or Scrapy for scraping, pandas for data manipulation, and openpyxl for Excel file handling.
[Security] - Prevent unauthorized access to the URLs and extracted data, and use secure methods to store and transmit any sensitive information.
[Format] - Output as a clean, well-structured .xlsx file following the provided column layout.
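The extraction described under [Target] and [Method] could look roughly like the sketch below. It is a minimal sketch that assumes BeautifulSoup (`bs4`) and parses an HTML string rather than fetching a live URL. The function name `extract_fields` and the selectors (`h1`, `meta[name=author]`, `time`, `article p`) are hypothetical placeholders, because real pages will need selectors chosen site by site.

```python
from bs4 import BeautifulSoup


def extract_fields(html: str) -> dict:
    """Pull headline, author, date, and body text out of one article page.

    Hypothetical helper: the selectors are illustrative guesses and would
    need adjusting for each target site.
    """
    soup = BeautifulSoup(html, "html.parser")

    # Headline: first <h1> on the page, if any. get_text() without strip=True
    # keeps the spaces between inline tags; .strip() trims only the ends.
    h1 = soup.find("h1")
    headline = h1.get_text().strip() if h1 else ""

    # Author: assumed to live in a <meta name="author"> tag.
    meta_author = soup.find("meta", attrs={"name": "author"})
    author = meta_author.get("content", "").strip() if meta_author else ""

    # Date: prefer the machine-readable datetime attribute of <time>,
    # falling back to its visible text.
    time_tag = soup.find("time")
    date = ""
    if time_tag:
        date = time_tag.get("datetime") or time_tag.get_text().strip()

    # Body: paragraphs inside <article>, kept in document order and joined
    # with blank lines so paragraph breaks survive into the spreadsheet.
    paragraphs = [p.get_text().strip() for p in soup.select("article p")]
    body = "\n\n".join(p for p in paragraphs if p)

    return {"headline": headline, "author": author, "date": date, "body": body}


sample = """
<html><head><meta name="author" content="Jane Doe"></head>
<body><article>
  <h1>Example headline</h1>
  <time datetime="2026-05-14">May 14, 2026</time>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
</article></body></html>
"""
print(extract_fields(sample))
```

Missing elements yield empty strings rather than exceptions. This lets one malformed page produce a blank cell instead of stopping the whole run.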
Workflow:
1. Receive the list of URLs from the client.
2. Set up the scraping environment with the necessary libraries (BeautifulSoup or Scrapy).
3. Define the target text elements (headlines, authors, body, dates) and extract them from each URL.
4. Ensure the formatting (spelling, punctuation, line breaks) matches the source exactly.
5. Keep text blocks in their original order to preserve traceability.
6. Write the extracted data to a structured Excel workbook using pandas and openpyxl.
7. Save interim versions for review and adjust as needed.
8. Spot-check accuracy against a random sample of pages.
9. Deliver the final .xlsx file for client review.
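Steps 5, 6, and 8 could be sketched as follows. This assumes pandas with the openpyxl engine. The column list in `COLUMNS`, the sheet name `Articles`, and the helpers `build_workbook` and `spot_check_sample` are all hypothetical; the client's provided column layout would replace them. An explicit `order` column keeps each row traceable to its position in the URL list.

```python
import random
from io import BytesIO

import pandas as pd

# Hypothetical column layout; substitute the client's actual layout.
COLUMNS = ["order", "url", "headline", "author", "date", "body"]


def build_workbook(rows: list[dict]) -> bytes:
    """Write scraped rows to an in-memory .xlsx file, preserving input order."""
    records = []
    for i, row in enumerate(rows, start=1):
        # Number rows in the order the URLs were received (step 5).
        records.append({"order": i, **row})
    df = pd.DataFrame(records, columns=COLUMNS)
    buffer = BytesIO()
    # The openpyxl engine writes .xlsx; index=False drops pandas' row index.
    df.to_excel(buffer, sheet_name="Articles", index=False, engine="openpyxl")
    return buffer.getvalue()


def spot_check_sample(rows: list[dict], k: int = 3, seed: int = 0) -> list[dict]:
    """Pick up to k random rows to compare by hand against the live pages (step 8)."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-checked
    return rng.sample(rows, min(k, len(rows)))


rows = [
    {"url": "https://example.com/a", "headline": "A", "author": "X",
     "date": "2026-05-01", "body": "Line one\nLine two"},
    {"url": "https://example.com/b", "headline": "B", "author": "Y",
     "date": "2026-05-02", "body": "Text"},
]
data = build_workbook(rows)

# Read the workbook back to confirm order and line breaks survived.
print(pd.read_excel(BytesIO(data), sheet_name="Articles"))
for row in spot_check_sample(rows, k=1):
    print("Check by hand:", row["url"])
```

Returning bytes keeps the sketch free of disk writes. To save a deliverable, write `data` to a file opened in `"wb"` mode, or pass a filename to `to_excel` directly.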